lukemanley
Member

import pandas as pd

data = [f"i-{i:05}" for i in range(100_000)]
dtype = "string[pyarrow_numpy]"

idx1 = pd.Index(data, dtype=dtype)
idx2 = pd.Index(data[1:], dtype=dtype)

# With this PR, the is_unique lookup at the end is served from the cache.
# (%timeit is an IPython magic; run this in IPython or Jupyter.)
%timeit idx1.join(idx2, how="outer").is_unique

# 59.1 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  -> main
# 41.9 ms ± 894 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)   -> PR
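
The benchmark above can be read against a toy model of cache preservation. The following is only a minimal sketch in plain Python, not pandas code: `TinyIndex`, `outer_join`, and the `scans` counter are hypothetical names invented for illustration, and the sketch assumes that a join whose result equals one of its inputs can hand back that input object so its cached attributes carry over. Unlike a real outer join, the toy version also deduplicates and sorts via a set union.

```python
from functools import cached_property


class TinyIndex:
    """Hypothetical stand-in for an index; not part of pandas."""

    def __init__(self, values):
        self.values = tuple(values)
        self.scans = 0  # counts how often uniqueness is actually computed

    @cached_property
    def is_unique(self):
        # O(n) scan; cached_property stores the result on the instance.
        self.scans += 1
        return len(set(self.values)) == len(self.values)

    def outer_join(self, other):
        # Toy outer join: sorted union of both sides (assumption: inputs
        # are sorted and duplicate-free, as in the benchmark data).
        merged = tuple(sorted(set(self.values) | set(other.values)))
        # If the result is identical to one input, reuse that object so
        # anything already cached on it stays available.
        if merged == self.values:
            return self
        if merged == other.values:
            return other
        return TinyIndex(merged)


data = [f"i-{i:05}" for i in range(1000)]
idx1 = TinyIndex(data)
idx2 = TinyIndex(data[1:])

# Repeated join + is_unique, mirroring the timed expression.
results = [idx1.outer_join(idx2).is_unique for _ in range(3)]
```

In this sketch the uniqueness scan runs once, on the first iteration; later iterations reuse the value cached on `idx1`, which is the kind of saving the timed expression measures.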

lukemanley added the Performance (memory or execution speed performance) and Index (related to the Index class or subclasses) labels on Jan 23, 2024
lukemanley added this to the 3.0 milestone on Jan 23, 2024
mroeschke merged commit 622f31c into pandas-dev:main on Jan 24, 2024
@mroeschke
Member

Thanks @lukemanley

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…dev#57023)

* Index.join result name

* whatsnew

* update test

* Index._wrap_join_result to maintain cached attributes if possible

* Index._wrap_join_result to maintain cached attributes if possible

* whatsnew

* allow indexers to be None

* gh ref

* rename variables for clarity